ABSTRACT


The  need  for  voice  recognition  users  to  wear  protective 
masks  in  conjunction  with  voice  input  duties  is  becoming  important 
in  both  the  military  and  industrial  communities.  For  example,  the 
Army  is  interested  in  using  voice  recognition  input  for  a  Tactical 
Fire  Control  Computer  System  (TACFIRE)  in  the  field.  It  is 
essential  that  both  the  capability  to  protect  the  user  from  a 
chemical  warfare  environment  and  voice  recognition  accuracy  be 
maintained.  Likewise,  in  some  voice  input  applications,  such  as  a 
command  post,  situations  exist  when  it  is  desirable  to  enter  all 
voice  input  commands  silently.  In  both  examples,  some  type  of 
protective  mask  (i.e.,  gas  mask  or  stenographer's  mask)  would  be 
used   in    conjunction    with    voice    recognition    equipment. 

A  previous  study  tested  an  easily  removable  protective  mask 
called  a  Stenographer's  mask  in  conjunction  with  a  Threshold 
Technology,  Inc.,  T600  voice  recognition  unit.  This  paper  will 
present  the  results  of  the  second  half  of  the  protective  mask  study 
conducted  at  the  Naval  Postgraduate  School.  This  particular 
research  was  conducted  to  investigate  the  recognition  accuracy  of 
a  different  currently  available  voice  recognizer  using  an  Army  gas 
mask.  Of  particular  concern  was  the  need  to  determine  if  the 
algorithms  and  equipment  could  handle  the  expected  voice 
resonance  and  forced  breathing  sounds  associated  with  the 
nonremovable  protective  mask  without  any  significant  degradation 
in  recognition  capability.  This  study  tested  subjects  under 
various    microphone    and    mask    conditions. 
I.   INTRODUCTION

In  recent  years,  voice  technology  has  developed  to  the  extent 
that  basic  systems  have  now  been  used  successfully  in  several 
industrial  and  military  applications.  With  constant  improvements 
being  made  in  the  capabilities  of  voice  recognition  systems,  their 
use  in  a  wider  variety  of  settings  is  already  being  contemplated. 
Within  the  last  year,  two  independent  requests  have  been  made  to 
the  researchers  at  the  Naval  Postgraduate  School  to  investigate  the 
capability  of  voice  recognition  systems  to  be  used  in  conjunction 
with   several   types   of   protective    masks. 

The  first  request  came  from  users  of  a  voice  recognition 
system  located  in  a  Command  Control  and  Communication  (C3) 
center  in  Hawaii.  They  had  successfully  used  one  system  to  run 
daily  computer  queries  on  the  current  status  of  naval  forces.  The 
command  center  staff  was  considering  the  use  of  voice  input  on 
several  terminals  in  order  that  numerous  queries  could  be  made  at 
once.  Furthermore,  the  staff  was  investigating  the  possibility  of 
using  voice  to  operate  a  newly  planned  computer  system  used  to 
generate    graphic    situation    displays. 

One  primary  concern  of  the  staff  was  the  interference  of 
voice  input  to  the  efficient  operation  of  the  command  center.  The 
terminal  presently  used  has  been  isolated  in  a  room  adjacent  to  the 
main  command  center,  but  the  additional  new  equipment  would  be 
placed  directly  in  the  command  center.  Although  recent  research 
(Elster  1980)  showed  that  background  noise  (including  speech)  did 
not  interfere  significantly  with  voice  recognition  accuracy,  little 
research  has  been  conducted  on  the  effectiveness  of  voice  in  larger 
installations  where  several  speakers,  each  operating  a  separate 
recognizer,  may  be  required  to  make  inputs  simultaneously.  It  is 
conceivable  that,  under  those  conditions,  the  speakers  or  operators 
themselves  might  become  confused  by  each  other's  speech,  thus 
perhaps  increasing  input  errors.  Confusing  situations  could  also 
be  created  during  command  briefings  when  the  computerized 
situation  display  is  operated  by  voice.  The  voice  input  might 
interfere  with  the  high  level  briefings  being  conducted  as  the 
displays   are    generated. 

These  two  situations  could  produce  unwanted  effects  on  the 
command  center,  especially  during  crisis  situations.  One  way  to 
reduce  the  effects  of  operator  confusion  and/or  interference  with 
other  command  center  operations  is  to  have  each  speaker  direct  his 
or  her  commands  into  a  mask,  so  that  little  if  any  recognizable 
sound  escapes  into  the  speaker's  immediate  environment.  Hence, 
the  probability  of  disturbing  or  confusing  a  nearby  speaker  may  be 
reduced markedly, and interference with other command center
operations    eliminated. 


The  second  inquiry  about  using  voice  recognition  systems  in 
conjunction  with  masks  came  from  an  Army  field  artillery  group  in 
Oklahoma  and  the  Army's  High  Technology  Testbed  Project  at  Fort 
Lewis,  Washington.  After  a  15  minute  demonstration  of  voice 
input  to  the  Army's  computerized  Tactical  Fire  Control  Computer 
(TACFIRE),  the  group  started  to  formulate  future  voice  input 
requirements  for  TACFIRE.  At  present,  voice  input  is  only  being 
considered  for  use  in  a  mobile  but  fairly  stable  computer  center  at 
the  Division  level.  This  group  of  Army  officers  was  interested  in 
starting  the  research  needed  to  determine  the  feasibility  of  using 
voice  input  at  mobile  terminal  sites  used  by  lower  command  levels 
and  usually  located  in  a  more  hostile  environment  than  the  Division 
level  counterpart.  The  problems  which  must  be  overcome  before 
voice  can  be  determined  effective  in  such  a  stressful  environment 
are  numerous.  The  problems  include  mobility  and  size  of  the 
recognition  unit,  multiple  user  capability,  and  the  ability  to 
operate  in  all  warfare  environments.  The  chemical  warfare 
environment  is  of  particular  concern  since  it  would  require  the  use 
of   voice    recognition    systems   in    conjunction    with    gas    masks. 

Therefore,  the  question  at  issue  is:  How  well  will  current 
voice  recognition  equipment  perform  under  "masked"  conditions 
such  as  those  described  above?  Specifically,  does  the  impressive 
accuracy  rate  ascribed  to  currently  available  voice  recognition 
equipment  suffer  significantly  if  the  user  is  required  to  enter 
utterances  to  the  system  through  a  mask,  as  opposed  to  the 
conventional    "boom"    microphone    mounted    on    a    headset? 

In  order  to  answer  this  question,  two  independent  but  similar 
experiments  were  conducted.  The  first  experiment  used  a 
stenographer's  mask  similar  to  the  one  that  is  envisioned  for  use  in 
the  command  center.  It  is  interesting  to  note  that  the  steno  mask 
tested  is  also  used  by  the  Marine  Corps  to  muffle  voice 
communications  when  operating  close  to  enemy  positions.  The 
results  from  this  experiment  therefore  have  direct  application  to 
both   the    command    center    and    Army   problems. 

The  detailed  results  of  the  stenographer's  mask  experiment 
are  described  in  Poock,  Schwalm  and  Roland,  1982.  The 
experiment  showed  that  the  mask  caused  a  statistically  significant 
increase  in  the  misrecognition  rate.  The  increase  in 
misrecognitions  appeared  to  be  highly  dependent  on  the  experience 
level  of  the  user  with  respect  to  speaking  into  and  using  masks  and 
microphones.  For  the  subjects,  such  as  pilots,  who  considered 
themselves  experienced  mask  and  microphone  users,  the  increase  in 
error  rate  (although  statistically  significant)  was  not  practically 
significant.  The  error  rate  for  this  group  of  subjects  was  still 
under  2%,  and  would  not  degrade  the  efficient  use  of  voice 
recognition    equipment    with    the    stenographer's    mask. 


The  second  experiment,  which  is  the  subject  of  this  research 
report,  was  conducted  for  two  reasons.  The  first  and  most 
important  objective  was  to  establish  whether  or  not  a  more 
restrictive  protective  mask  had  a  large  effect  on  the  recognition 
error  rate.  The  stenographer's  mask  was  manually  held  in  place 
over  the  nose  and  mouth  by  the  subject.  Therefore,  the  mask  was 
easily  removed  to  take  large  breaths  or  for  comfort.  The  more 
restrictive    mask    used    was   an    Army    M24    field    protective    mask. 

The  second  objective  of  the  experiment  was  to  test  a 
different  recognizer  for  its  suitability  with  protective  masks.  A 
direct  comparison  of  two  different  recognizers  was  not  the 
objective,  but  a  different  recognizer  was  chosen  to  investigate  the 
algorithm  differences  which  might  enhance  or  degrade  the 
protective    mask    recognition   rate. 

As  background,  the  recognizer  and  gas  mask  used  will  be 
discussed.  Secondly,  the  experimental  design  will  be  presented, 
followed  by  the  results  of  the  experiment.  The  analyzed  data  left 
some  questions  unanswered,  so  a  short  side  experiment  was 
conducted  to  determine  whether  or  not  further  experimentation  is 
warranted.  This  small  test  will  be  described,  ending  with  the 
conclusions  and  recommendations  of  the  second  half  of  the  study  to 
analyze  the  use  of  protective  masks  and  voice  recognition 
equipment. 
An Interstate Electronics Corporation VRT101 Voice
Recognition  System  was  used.  The  VRT101  is  a  user  dependent, 
discrete  utterance,  board  level  recognizer.  The  recognition  board 
is  accessed  in  conjunction  with  a  Heath  Zenith  Z80  based  8  bit 
microprocessor.  The  recognizer  has  a  100  utterance  capacity. 
One  word  is  reserved  for  a  word  correction  capability  which  leaves 
99  words  available  for  specific  user  definition.  The  correction 
capability  allows  the  user  to  easily  erase,  through  a  voice 
command,    the   results   of   the   last   voice    input   command. 

The  Interstate  system  has  the  capability  to  set  various 
utterance  rejection  levels.  The  first  is  the  reject  threshold  level 
which  is  used  to  discard  words  whose  algorithm  score  is  not  greater 
than  or  equal  to  the  threshold  value  set.  The  reject  level  was  set 
to  85,  as  suggested  by  Interstate  Electronics,  and  this  setting 
caused  only  13  nonrecognitions  throughout  the  entire  experiment. 
The  second  variable  set  was  the  delta  value.  This  value  is  used  to 
reject  an  utterance  when  the  difference  between  its  classification 
score  and  the  runner  up's  classification  score  is  less  than  or  equal 
to the selected value. Since there was no data on what an
appropriate value for this variable should be, it was set to zero,
so that no voice input was rejected merely because of a close
runner-up. Delta level data was nonetheless collected for each
utterance during the experiment for analysis purposes. Because
there was such a small number of nonrecognitions, only the
misrecognition data was analyzed and is presented here. The
initial delta level analysis is also provided.
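
The two rejection settings described above can be sketched in a few lines of Python. This is only an illustration of the rules as worded in the text, not the VRT101 logic itself: the function name `classify` and its arguments are hypothetical, and scores are assumed to be plain numbers where a higher score means a better match. Taken literally, the "less than or equal to" wording means an exact tie would still be rejected at a delta value of zero.

```python
from typing import Optional

REJECT_THRESHOLD = 85  # reject level used in the experiment
DELTA = 0              # delta value used in the experiment


def classify(best_word: str, best_score: float, runner_up_score: float,
             threshold: float = REJECT_THRESHOLD,
             delta: float = DELTA) -> Optional[str]:
    """Return the recognized word, or None if the utterance is rejected."""
    # Rule 1: discard a score that is not >= the reject threshold.
    if best_score < threshold:
        return None
    # Rule 2: discard when the margin over the runner-up is <= delta.
    if best_score - runner_up_score <= delta:
        return None
    return best_word


print(classify("FIRE", 90, 89))           # accepted at delta = 0
print(classify("FIRE", 90, 89, delta=2))  # rejected: margin of 1 is <= 2
```

Raising the delta value turns close calls into rejections (nonrecognitions) instead of possible misrecognitions, which is the trade-off examined later for Subject 1.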

In  order  to  access  the  delta  level  data,  a  program  was  written 
to  pull  the  recognized  word  off  the  parallel  port  of  the  recognition 
board.  This  program  made  use  of  several  FORTRAN  subroutines 
supplied  by  Interstate  Electronics.  A  copy  of  the  program  is  in 
Appendix  A.  To  fully  understand  the  details  of  the  program,  the 
VRT101    system    documentation   should    be    consulted. 

Basically, the program is designed to wait for information on
the recognizer's parallel port, decipher the information, and
display the word and delta value information on the CRT screen for
manual data collection.
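
The wait, decipher, and display steps can be illustrated with a short Python sketch. The record layout below is an assumption inferred from the partly legible listing in Appendix A (two skipped bytes, a two-digit utterance number, a two-digit delta value, five further digits, then two text fields ended by "/" and a carriage return). The names `Recognition` and `parse_record` are hypothetical, and treating the "FF" code as a rejected utterance is also an assumption. The VRT101 documentation remains the authority on the actual format.

```python
from dataclasses import dataclass
from typing import Optional

FIELD_SEPARATOR = b"/"  # ASCII 47, tested for in the listing
RECORD_END = b"\r"      # ASCII 13, tested for in the listing
SPECIAL_CODE = b"FF"    # ASCII 70, 70; assumed to mark a rejection


@dataclass
class Recognition:
    utterance_number: int
    delta: int
    first_text: str
    second_text: str


def parse_record(record: bytes) -> Optional[Recognition]:
    """Decode one record, or return None when the special code is present."""
    body = record[2:]                   # two leading bytes are skipped
    if body[0:2] == SPECIAL_CODE:
        return None
    utterance_number = int(body[0:2])   # two ASCII digits
    delta = int(body[2:4])              # two ASCII digits
    # body[4:9] holds five more digits that this sketch does not interpret.
    text = body[9:].rstrip(RECORD_END)
    first, _, second = text.partition(FIELD_SEPARATOR)
    return Recognition(utterance_number, delta,
                       first.decode("ascii").strip(),
                       second.decode("ascii").strip())


print(parse_record(b"  120709815FIRE/FIVE\r"))
```

The sample record is made up for illustration; it is not data from the experiment.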

Gas    Mask 

The  gas  mask  used  is  the  M24  field  protective  mask  used  by 
tank  operators  in  the  Army.  This  particular  gas  mask  comes 
equipped  with  an  internally  mounted  microphone  that  is  placed 
directly  in  front  of  the  mouth,  slightly  below  the  user's  bottom  lip. 
The air hose for the gas mask is to the left of the microphone and is
placed at the same height. It is believed that the microphone and
air hose are in the worst possible positions, so the data collected
should represent a lower bound on recognition accuracy. In other
words, the recognition accuracy would probably increase if the
mask were specially designed for use with voice recognition
equipment.

The  experiment  tested  two  different  microphones  mounted  in 
the  gas  mask.  The  first  microphone  was  the  original  gas  mask 
microphone,  which  was  an  Electro  Voice,  Inc.,  Microphone  Dynamic 
M118.  No  documentation  was  available  on  its  performance 
characteristics. 

The  second  device  used  was  the  Shure  SM10  noise  cancelling 
pressure  differential  microphone.  Since  this  microphone  works  on 
a  pressure  differential  between  the  top  and  bottom  of  the 
microphone  to  distinguish  surrounding  noise  from  the  utterance, 
special  care  had  to  be  taken  to  mount  the  microphone  properly. 
The  mounting  technique  used  required  that  enough  space  be  left 
open  underneath  the  microphone  to  allow  for  the  pressure 
differential  characteristics  required  for  proper  operation.  This 
resulted  in  the  microphone  being  placed  higher  in  the  microphone 
housing  and  thus  closer  to  the  user's  mouth  than  the  original  gas 
mask  microphone.  This,  unfortunately,  could  not  be  avoided 
without  redesigning  the  entire  mask  assembly  which  was  outside  the 
purview    of   the    experiment. 


III.   EXPERIMENTAL DESIGN

Twelve  subjects  (5  males,  7  females)  participated  in  this 
experiment.  One  female  subject  was  a  volunteer  and  was  an 
experienced  voice  recognition  user.  The  other  11  subjects  were 
Army  enlisted  personnel  assigned  to  Fort  Ord,  California.  All  11 
enlisted  subjects  had  never  seen  voice  recognition  equipment 
before,  and  6  of  the  11  had  little  or  no  interaction  with  computers. 
They  did  not  volunteer  for  the  experiment,  but  were  assigned  to 
participate  in  addition  to  their  normal  military  duties.  Their  ages 
ranged   from    19   to    39,    with    a    median    age    of    23. 

A  6X3X6  mixed  design  with  repeated  measures  on  two  factors 
was  employed  in  this  experiment.  The  first  factor,  order  of  mask 
use,  was  the  between  variable,  and  was  composed  of  the  6  orders  in 
which  all  three  masks  could  be  used  by  each  subject;  subjects  were 
nested  within  this  variable  so  that  three  subjects  received  one  of 
each  of  the  six  possible  "mask"  orders.  This  counterbalancing 
scheme  was  adopted  to  control  any  effects  that  order  of  use  may 
have contributed to the results. "Mask" condition (N = No Mask,
O = Original Mask, S = Shure Mask) was a three-level, within group
variable  with  each  subject  performing  under  each  of  the  three 
"mask"  conditions.  Each  subject  also  performed  6  trials  with  each 
mask,  making  trials  the  second  within  group  variable  with  6  levels. 
A    summary   of   the    experimental   design    appears   in    Figure    1. 
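
The counterbalancing scheme can be checked with a short Python sketch. It only enumerates the possible orders of the three mask conditions and counts the repeated-measures cells per subject; the variable names are hypothetical, and the sketch does not reproduce how subjects were actually assigned to orders.

```python
from itertools import permutations

MASKS = ("N", "O", "S")  # No Mask, Original Mask, Shure Mask
TRIALS_PER_MASK = 6

# Every order in which one subject could use the three masks.
orders = list(permutations(MASKS))

# Each subject contributes one observation per mask per trial.
cells_per_subject = len(MASKS) * TRIALS_PER_MASK

for index, order in enumerate(orders, start=1):
    print(index, "-".join(order))
print("orders:", len(orders), "cells per subject:", cells_per_subject)
```

Three conditions yield exactly six orders, which is the first "6" in the 6X3X6 design; the remaining factors are the three masks and the six trials.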

The  full  utterance  capacity  of  the  VRT101  system  was  used  in 
the  experiment.  The  word  list  used  contained  utterances 
necessary  to  input  information  to  the  Army's  Tactical  Fire  Control 
System  (TACFIRE).  The  update  fire  unit  message  template  was 
chosen  as  a  typical  TACFIRE  application  and  the  vocabulary  was 
developed  to  fulfill  that  specific  template  requirement.  The 
TACFIRE  application  was  used  because  it  is  the  basis  for  the  gas 
mask  experiment.  The  words  developed  and  used  are  listed  in 
Appendix    B. 

For  training,  each  subject  repeated  the  100  word  vocabulary 
list  in  sequential  order  7  times.  This  was  the  manufacturer's 
suggested  number  of  passes.  Because  of  time  and  subject 
availability,  all  training  passes  took  place  during  one  sitting 
instead  of  training  over  a  longer  period  of  time  as  suggested  by  the 
manufacturer.  Each  subject  trained  the  entire  vocabulary  on 
Monday.  This  took  about  45  minutes.  Immediately  after  training, 
subjects  made  at  least  two  passes  through  the  entire  100  word 
vocabulary  (essentially  a  test  session)  to  identify  any  problems  in 
the  training  of  a  particular  utterance.  When  the  system  produced 
correct  responses  on  those  two  passes,  the  utterance  was 
considered  adequately  trained.  If  errors  occurred,  a  third  pass 
was made. If fewer than two of three passes of any utterance were
correct, that utterance was retrained. It should be noted that
there were 5 times during the 3 week period (all under the gas mask
conditions) when the test could not be passed adequately for all of
the words. In these five cases, all words were recognized at least
once, and the test failed for a maximum of 6 words for any one
subject even after numerous tries at retraining.
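
The post-training check amounts to a simple two-of-three rule, sketched below in Python. The function name `utterance_status` and its return strings are hypothetical; the rule itself follows the description above.

```python
from typing import Sequence


def utterance_status(passes: Sequence[bool]) -> str:
    """Classify one utterance from its per-pass results (True = correct)."""
    # Two correct responses on the first two passes: adequately trained.
    if len(passes) >= 2 and all(passes[:2]):
        return "trained"
    # An error on either of the first two passes calls for a third pass.
    if len(passes) < 3:
        return "needs third pass"
    # Fewer than two correct out of three: retrain the utterance.
    return "trained" if sum(passes[:3]) >= 2 else "retrain"


print(utterance_status([True, True]))          # trained
print(utterance_status([True, False, True]))   # trained
print(utterance_status([False, False, True]))  # retrain
```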

After  training,  subjects  tested  the  system.  Each  subject  was 
scheduled  to  make  two  passes  through  the  entire  vocabulary  list  on 
each  of  three  successive  days.  These  testing  sessions  were 
administered  on  Tuesday,  Wednesday  and  Thursday  of  the  same  week 
in  which  training  took  place.  Thus,  a  total  of  six  testing  trials 
were  run  for  each  subject  under  each  "mask"  condition.  In  this 
way,  subjects  were  able  to  complete  training  and  testing  on  one 
mask  condition  within  one  week.  The  experiment  ran  for  a  total  of 
three    weeks,    with    one    mask    condition   being    run    each    week. 

The independent variable in this study was "mask" condition:
No Mask, where subjects trained and tested the system using the
conventional "boom" microphone; the Original Mask, where subjects
trained and tested the system using the gas mask containing the
standard microphone supplied by the manufacturer; and the Shure
Mask, where subjects trained and tested the system using the gas
mask containing the Shure SM10 microphone.

The dependent variable in this study was the number of
misrecognitions. There were few nonrecognitions; therefore, they
were not considered in the analysis.

At  the  conclusion  of  the  experiment,  each  subject  was  asked 
to  fill  out  a  questionnaire  designed  to  measure  certain  attitudes 
and  experience  variables  that  the  researchers  felt  might  affect 
performance.      A    copy    of   the    questionnaire    is   in    Appendix    C. 

FIGURE 1.   SUMMARY OF EXPERIMENTAL DESIGN


IV.   ANALYSIS 

All analyses were performed using the SPSS (Nie, Hull, Jenkins,
Steinbrenner and Bent, 1975) and BMDP (Brown, Engelman, Frane, Hill,
Jennrich and Toporek, 1981) statistical packages. All repeated
measures analysis of variance procedures were performed on the
arcsin transformation of the raw data to stabilize the variance of
the error terms (Neter and Wasserman, 1974). The mean error rates
that appear in the figures, however, are untransformed. All posterior
tests for significance between pairs of means were performed using
the Scheffe procedures described in Bruning and Kintz (1977).
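
The variance-stabilizing step can be illustrated in Python. The report does not state which form of the arcsin transformation was used; the sketch below assumes the common form for proportions, 2 arcsin(sqrt(p)), and the function name is hypothetical.

```python
import math


def arcsine_transform(proportion: float) -> float:
    """Return 2 * arcsin(sqrt(p)) for an error proportion p in [0, 1]."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return 2.0 * math.asin(math.sqrt(proportion))


# Error rates are entered as proportions, not percentages.
for rate in (0.05, 0.10, 0.20):
    print(rate, round(arcsine_transform(rate), 4))
```

The transformation stretches differences near 0 and 1, where the variance of a raw proportion is smallest, so that the error terms in the analysis of variance are more nearly homogeneous.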

Table 1 presents the analysis of variance summary for
misrecognitions. A significant main effect of mask condition (F =
8.97, p < .01) is evident, as is a slight mask by trial interaction
(F = 2.16, p < .04). This mask-trial interaction can be seen in
Figure 2, where there is a definite degradation in performance for
the No Mask condition in trials 5 and 6, but no apparent
degradation for the two gas mask conditions. It is
interesting  to  note  that  a  similar  performance  trend  was  reported 
for  the  No  Mask  condition  in  the  Stenographer's  mask  experiment. 
Although  very  slight,  this  phenomenon  has  not  been  thoroughly 
explained. 

With  regard  to  the  main  effect  of  mask  condition,  a  Scheffe 
test  for  significance  between  pairs  of  means  was  performed.  The 
results  of  these  tests  indicated  a  significant  difference  existed 
between  all  pairs  of  mask  conditions.  Table  2  presents  the 
calculated  95%  confidence  interval  for  the  estimated  misrecognition 
rate  difference  between  all  3  paired  mask  conditions. 

Table  3  presents  a  summary  by  subject  of  the  data  collected. 
The  error  rates,  to  say  the  least,  are  unacceptably  high.  The  No 
Mask  error  rate  of  7.7%  is  extremely  high  for  Interstate 
performance,  but  can  be  explained  by  the  attitude  of  two  of  the 
participants  and  the  new  vocabulary  list  used.  If  subjects  10  and 
11  are  removed  from  the  table  as  outliers,  the  mean  error  rate  for 
the No Mask condition becomes 5.3%. This error rate is consistent
with  rates  reported  by  Interstate  for  a  full  100  word  vocabulary. 
Furthermore,  there  were  7  words  which  had  an  unusually  high  error 
rate  and  it  is  felt  that  simple  utterance  changes  for  these  words 
would  bring  the  error  rate  down. 
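
The effect of removing the two outlying subjects can be verified directly from the No Mask column of Table 3. The Python sketch below uses those values; the function and variable names are hypothetical.

```python
# No Mask error rates (percent) by subject, from Table 3.
NO_MASK_ERROR = {
    1: 4.50, 2: 6.67, 3: 6.17, 4: 4.50, 5: 5.33, 6: 6.83,
    7: 5.67, 8: 5.83, 9: 3.50, 10: 16.67, 11: 23.17, 12: 4.00,
}


def mean_error(rates, exclude=()):
    """Mean error rate over all subjects not listed in `exclude`."""
    kept = [rate for subject, rate in rates.items() if subject not in exclude]
    return sum(kept) / len(kept)


print(round(mean_error(NO_MASK_ERROR), 2))                    # 7.74
print(round(mean_error(NO_MASK_ERROR, exclude=(10, 11)), 2))  # 5.3
```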

Even  though  there  are  possible  vocabulary  changes  which  might 
improve  recognition,  the  large  increase  in  error  rate  for  the  masked 
condition cannot be easily pinpointed. Further research must be
conducted  to  determine  the  exact  cause  of  the  extremely  high  error 
rates  recorded  for  the  gas  masks.  Possible  causes  include  the 
following  items. 


Source of Variance        DF        MS         F

Order (O)                  5      .691      0.84
Error                      5      .818       -
Mask Condition (M)         2     2.542      8.97**
M X O                     10      .305      1.08
Error                     12      .283       -
Trials (T)                 5      .005      0.31
T X O                     25      .010      0.56
Error                     30      .018       -
M X T                     10      .024      2.16*
M X T X O                 50      .005      0.43
Error                     60      .011       -

** p < .01
 * p < .04


TABLE 1.
ANALYSIS OF VARIANCE SUMMARY TABLE FOR TOTAL ERRORS.


Mask pair                    Confidence interval

No mask - Shure mask         (.08, .10)
No mask - Original           (.11, .13)
Shure mask - Original        (.019, .04)

TABLE 2
95% CONFIDENCE INTERVAL FOR PERCENT DIFFERENCE BETWEEN MASK CONDITIONS


[Plot not reproduced. Vertical axis ticks run from 0 to 25; horizontal axis: Trials 1 to 6.]

FIGURE 2.   TOTAL ERROR RATES BY MASK CONDITIONS BY TRIALS


Subject      No Mask      Shure Mask      Original Mask

   1           4.50          13.33            12.67
   2           6.67          27.83            25.33
   3           6.17           5.00             4.67
   4           4.50          16.17            19.17
   5           5.33           5.83            43.50
   6           6.83          35.83            34.83
   7           5.67          11.33             6.00
   8           5.83          15.83             9.50
   9           3.50           7.50             9.67
  10          16.67          16.50            34.83
  11          23.17          19.83            24.00
  12           4.00          28.50            17.50

 Mean          7.74          16.96            20.14

TABLE 3.
MEAN TOTAL ERROR RATES (IN PERCENT) FOR MASK CONDITIONS BY SUBJECT


Mask Condition      Delta value = 0      Delta value = 2

      N                 4.5/0              1.17/10.33
      S                13.3/0              3.67/20.83
      O                12.6/0              2.67/30.17

TABLE 4.
MISRECOGNITION/NONRECOGNITIONS AT VARIOUS DELTA LEVELS
1.  The front mounted microphone was placed extremely
close  to  and  directly  in  front  of  the  mouth  and  resulted 
in  what  is  believed  to  be  the  worst  possible  position 
for   the    microphone,    especially   for   fricatives. 

2.  The  breathing  hose  connected  to  the  mask  filter  for 
incoming  air  was  placed  directly  next  to  the 
microphone.  This  caused  noise  at  the  beginning  and 
ending  of  words  as  the  user  took  a  breath  immediately 
before    and   after   the    utterance. 

3.  There is an outgoing air valve directly below the
microphone. This valve is covered by a small piece of
rubber.  As  the  speaker  breathes  out,  the  valve  opens, 
displacing  the  external  protective  piece  of  rubber. 
When  the  valve  closes,  the  piece  of  rubber  falls  back 
over  the  valve  opening.  After  the  mask  has  been  used 
for  a  period  of  time,  a  distinct  popping  sound  is  caused 
by  the  rubber  piece  being  snapped  back  over  the  valve 
opening.  This  sound  could  happen  during  an  utterance, 
immediately  following  the  hard  consonant  sounds,  such 
as    "p"    and    "t". 

4.  User  attitudes  encountered  might  have  an  effect,  not 
only  on  the  use  of  voice  recognition  under  protective 
mask  conditions,  but  the  use  of  the  technology  in 
general.  Some  subjects  became  frustrated  easily  and 
did  not  attempt  to  observe  the  simple  techniques  that 
they  were  taught  for  the  purpose  of  maximizing 
recognition  accuracy.  The  poor  attitude  encountered 
could  be  due  to  the  requirement  of  wearing  the 
uncomfortable  protective  mask,  the  fact  that  the 
experiment  was  a  required  additional  duty  for  the 
subjects,  or  resistance  of  the  subjects  to  accept  the 
new    technology. 

5.  The mask was not always adjusted snugly against the
user's  face  in  the  user's  attempts  to  make  it  easier  to 
breathe. 

6.  The  Interstate  word  boundary  parameters  might  be 
adjusted  to  facilitate  the  automatic  "chopping"  off  of 
breath  sounds  at  the  beginning  and  ending  of  each 
utterance. 

7.  The  Interstate  algorithms  are  not  suited  for  this 
particular    application. 
On a positive note, five of the 12 subjects (Subjects 3, 5, 7,
8,  &  9)  achieved  some  relatively  acceptable  error  rates  (under 
10%)  for  all  masked  conditions.  Subject  5  had  a  cold  during  the 
original  mask  condition  and  had  an  extremely  high  error  rate.  It 
should  also  be  noted  that  all  five  of  these  subjects  rated 
themselves  as  very  experienced  in  the  use  of  masks  or 
microphones. 

Finally, the number of misrecognitions would be reduced, or at
least replaced by nonrecognitions, if the Interstate delta value
were set to a value other than 0. Table 4 summarizes the delta
level data
collected  for  Subject  1.  The  misrecognition  rate  is  drastically 
reduced,  but  is  replaced  by  an  inordinate  number  of 
nonrecognitions  even  in  the  No  Mask  condition.  This  is  definite 
evidence  of  utterance  similarity  which  hopefully  can  be  reduced  by 
modification   of   the    word   list. 
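
The trade-off in the Subject 1 data can be made explicit by adding the two failure types together. The Python sketch below assumes the Table 4 entries are percentages given as misrecognitions/nonrecognitions; the names are hypothetical.

```python
# Subject 1, Table 4: (misrecognition %, nonrecognition %) by delta value.
TABLE_4 = {
    "N": {0: (4.5, 0.0), 2: (1.17, 10.33)},
    "S": {0: (13.3, 0.0), 2: (3.67, 20.83)},
    "O": {0: (12.6, 0.0), 2: (2.67, 30.17)},
}


def total_failure_rate(mask: str, delta: int) -> float:
    """Percent of utterances that were either misrecognized or rejected."""
    misrecognitions, nonrecognitions = TABLE_4[mask][delta]
    return misrecognitions + nonrecognitions


for mask in ("N", "S", "O"):
    print(mask,
          round(total_failure_rate(mask, 0), 2),
          round(total_failure_rate(mask, 2), 2))
```

For every mask condition the combined rate is higher at a delta value of 2 than at 0, so the lower misrecognition rate is paid for with a larger number of inputs that must be repeated.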

The average delta value for the No Mask condition was 10.4,
while the average delta values for the Original and Shure Mask
conditions were 6.7 and 6.2, respectively. This can be interpreted
as a real
problem  with  the  use  of  the  gas  mask  and  voice  recognition 
technology.  The  poor  recognition  rate  achieved  goes  beyond  the 
utterance  list  used  and  the  experience  level  of  the  subject.  For 
the  Interstate  equipment,  there  is  a  definite  degradation  in  the 
algorithm's  ability  to  distinguish  between  utterances  when  the  gas 
mask    is   used. 
The previous section suggested numerous possible
explanations  for  the  poor  recognition  rate  achieved  in  this 
experiment.  The  majority  of  the  reasons  outlined  concerned  the 
mask  design,  and  user  attitudes  and  procedures.  The  remainder  of 
the  hypothesized  problem  areas  concerned  the  vocabulary  list  that 
was developed especially for this experiment and the Interstate
Electronics   equipment. 

In  order  to  determine  which  hypothesized  problem  area 
contributed  the  most  to  the  error  rate  demonstrated,  several 
independent  experiments  must  be  run,  along  with  the  professional 
development of a proper mask apparatus. Before suggesting that
money and time be spent on the redesign of the protective mask,
the authors conducted a very quick pilot experiment to determine
whether the vocabulary list and/or recognition equipment was a
major contributor to the error rate.

This  pilot  experiment  consisted  of  one  subject  repeating  the 
experiment,  using  the  same  vocabulary  list,  but  using  the 
Threshold  Technology,  Inc.,  Model  T600  recognition  system  instead 
of the Interstate Electronics VRT101. The subject, number 9,
was  the  experienced  recognition  user  who  was  a  volunteer  for  the 
main  experiment.  The  experiment  is  just  a  pilot  study  and  is  not 
presented    as   representing    statistically   significant   results. 

The  results  can  be  simply  stated.  For  the  No  Mask 
condition,  there  were  only  4  errors  (all  misrecognitions)  out  of 
the  600  utterances.  This  represented  a  99.3%  accuracy  rate. 
The original mask and Shure mask conditions produced 92.7% and
91.84%  accuracy  rates,  respectively.  These  accuracy  rates  are 
comparable  to  the  rates  achieved  with  the  Interstate  equipment 
for   the    mask    conditions. 
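
The No Mask accuracy figure follows from the error count. In the Python sketch below, the 600 utterances are taken to be six trials of the 100 word vocabulary, as in the main experiment; the function name is hypothetical.

```python
TRIALS = 6
VOCABULARY_SIZE = 100
UTTERANCES = TRIALS * VOCABULARY_SIZE  # 600 utterances per mask condition


def accuracy_percent(errors: int, utterances: int = UTTERANCES) -> float:
    """Percent of utterances recognized correctly."""
    return 100.0 * (utterances - errors) / utterances


print(round(accuracy_percent(4), 1))  # 99.3
```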

The  above  pilot  study  leads  the  authors  to  believe  that  the 
vocabulary  list  has  few  inherent  recognition  problems  without  the 
mask.  The  accuracy  rates  achieved  for  the  masked  conditions 
were  very  close  to  an  inefficient  level.  Therefore,  mask  redesign 
appears  to  be  the  next  step,  since  both  recognition  units  had 
severe    problems    when    the    gas    mask    was   used. 


The  results  of  the  present  study  are  not  very  encouraging. 
In  the  first  experiment,  it  is  apparent  that,  although  using  a 
stenographer's  mask  does  contribute  to  an  increase  in  the  percent 
of  misrecognition  errors  made,  this  increase  in  errors  may  be 
mitigated  to  a  large  extent  by  experience  using  masks  or 
microphones. This led the authors to suggest that, with
appropriate training, "masked" speakers could achieve an accuracy
rate comparable to "unmasked" speakers, using currently available
voice recognition equipment. In this second experiment, where
the masked condition is much more severe since the mask cannot
be easily removed to take a breath, much more research needs to
be  done.  Experience  also  proved  to  be  an  important  factor,  but 
the  error  rate  even  from  experienced  users  was  higher  than  the 
results  using  the  stenographer's  mask.  Areas  for  further  research 
are  the  placement  of  the  microphone  in  the  mask,  and  variation  of 
word  boundary  parameters  to  help  alleviate  the  breathing  sound 
problems  which  usually  occur  at  the  beginning  and  ending  of  words, 
and    might   be   responsible   for   utterance   similarity. 
APPENDIX A
RECOGNITION  PROGRAMS 


PE GAS.FOR

      DIMENSION IBUF(28),NODE(14)
      IBELL = Z'07'
C     BLANK-FILL THE NODE NAME (Z'2020' IS TWO ASCII SPACES)
      DO 100 I = 1,14
         NODE(I) = Z'2020'
  100 CONTINUE
      IER = 0
      NODE(1) = 'NA'
      NODE(2) = 'VY'
      CALL PRCSYN(NODE,IER)
      IF(IER .NE. 1) GO TO 999
      DO 1000 ITRIAL = 1,2
      DO 2000 IWORD = 1,1000
C     READ THE FIXED-POSITION CHARACTERS SENT FOR ONE UTTERANCE
      CALL VRMIN(IDUM)
      CALL VRMIN(IDUM)
      CALL VRMIN(IUN10)
      CALL VRMIN(IUN1)
      IF(IUN10 .EQ. 70 .AND. IUN1 .EQ. 70) GO TO 5000
      CALL VRMIN(IDS10)
      CALL VRMIN(IDS1)
      CALL VRMIN(IS100)
      CALL VRMIN(IS10)
      CALL VRMIN(IS1)
      CALL VRMIN(IRU10)
      CALL VRMIN(IRU1)
      DO 2100 I = 1,28
         IBUF(I) = Z'2020'
 2100 CONTINUE
C     FIRST TEXT FIELD ENDS AT ITST = 47 (ASCII '/')
      ITST = 47
      IBS = 1
      IBF = 14
      ASSIGN 150 TO IDIS
  130 DO 2200 K = IBS,IBF
         CALL VRMIN(IDUM)
         IF(IDUM .EQ. ITST) GO TO IDIS
         IBUF(K) = IDUM
         CALL VRMIN(IDUM)
         IF(IDUM .EQ. ITST) GO TO IDIS
         IBUF(K) = IBUF(K) + IDUM
 2200 CONTINUE
      GO TO IDIS
C     SECOND TEXT FIELD ENDS AT ITST = 13 (ASCII CARRIAGE RETURN)
  150 IBS = 15
      IBF = 28
      ITST = 13
      ASSIGN 160 TO IDIS
      GO TO 130
C     CONVERT TWO ASCII DIGITS TO AN INTEGER (48 IS ASCII '0')
  160 IDS = (IDS10 - 48) * 10 + (IDS1 - 48)
      IF(IUN10 .EQ. 70 .AND. IUN1 .EQ. 70) GO TO 5000
C     THE SOURCE LISTING CUTS OFF AFTER THE SECOND '(IBUF(I)'.
C     THE RANGE I=15,28 IS INFERRED FROM IBS, IBF AND FORMAT 903.
      WRITE(3,903) (IBUF(I),I=1,14),IDS,(IBUF(I),I=15,28)
      GO TO 2000
C     NO MATCH: SKIP TO THE CARRIAGE RETURN, THEN RING THE BELL
 5000 DO 5100 IC = 1,40
         CALL VRMIN(IDUM)
         IF(IDUM .EQ. 13) GO TO 5200
 5100 CONTINUE
 5200 WRITE(3,904)
      WRITE(3,905) IBELL
 2000 CONTINUE
      WRITE(3,902)
 1000 CONTINUE
      GO TO 9999
  999 WRITE(3,901)
 9999 STOP
  901 FORMAT(1X,'ERROR ENTERING PARALLEL MODE'/)
  902 FORMAT(1X,'ONE MORE PASS THROUGH THE VOCABULARY')
  903 FORMAT(1X,14A2,3X,I3,3X,14A2/)
  904 FORMAT(1X,'NO MATCH'/)
  905 FORMAT(3X,A2)
      END
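
The arithmetic at statement 160 of the listing turns two character codes into an integer by subtracting 48, the ASCII code for '0', and the test against 70 (ASCII 'F') routes the program to the "NO MATCH" message. A minimal Python sketch of those two steps, assuming the recognizer sends each field as ASCII characters; the function and constant names are hypothetical and do not appear in the original program:

```python
ASCII_ZERO = 48      # ord("0"), the offset subtracted at statement 160
NO_MATCH_CODE = 70   # ord("F"), the value both codes hold on a rejection


def decode_two_digits(tens_code: int, units_code: int) -> int:
    """Convert two ASCII digit codes to an integer, as statement 160 does."""
    tens = tens_code - ASCII_ZERO
    units = units_code - ASCII_ZERO
    if not (0 <= tens <= 9 and 0 <= units <= 9):
        raise ValueError("both codes must be ASCII digits")
    return tens * 10 + units


def is_no_match(first_code: int, second_code: int) -> bool:
    """Mirror the listing's test: both codes equal to 70 means no match."""
    return first_code == NO_MATCH_CODE and second_code == NO_MATCH_CODE


print(decode_two_digits(ord("4"), ord("2")))
print(is_no_match(ord("F"), ord("F")))
```

Unlike the FORTRAN original, the sketch rejects non-digit codes explicitly instead of producing a meaningless number.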
APPENDIX B
VOCABULARY  LIST 


NAVY;
A7 DELTA;
OUT UNTIL;
M-91;
BEARING SIGHT;
2;
A7 ECHO;
6;
ECHO 1;
7;
3 INCH 50;
AVAIL SUPPLY R
M-102;
M-110;
GRID ZONE;
XM-740;
M-10??
AMMUNITION UPC
RADIATION;
REINFORCING;
RATE;
DATE;
HONEST JOHN;
5 INCH 33;
NUCLEAR;
DIRECT SUPPORT;
M-114;
6 INCH 47;
5 INCH 54;
A4 MIKE;
0;
FORCE SUPPORTED;
HIGH EXPLOSIVE;
A4 ECHO;
F4 JULIETT;
1;
SITUATION REPORT;
SPHEROID;
M-107;
105 MILLIMETER;
HERCULES;
REINFORCED UNITS;
REACTION TIME;
ALL WEAPON TYPES;
XM-752;
DAY;
GENERAL SUPPORT;
CRITICAL AMMUNITION;
F4 DELTA;
MINIMUM RANGE;
175 MILLIMETER;
BASIC LOAD;
CHEMICAL;
A6 ECHO;
CANNON;
PERSHING;
RETURN;
A7 CHARLIE;
MINUTE;
DELETE REQUEST;
RESPONSE TIME;
UPDATE FIRE UNIT;
3200 SIGHT;
LANCE;
A;
3 INCH;
M-108;
NUCLEAR REPORT;
MISSILE ROCKET;
WEAPON STRENGTH;
6400 SIGHT;
3 INCH 35;
NONNUCLEAR REPORT;
GENERAL;
4;
BUILD A PLAN;
F4 BRAVO;
A6 ALPHA;
COORDINATES;
13;
ALPHA 2;
A-10;
HOUR;
WEAPON TYPE;
AIR;

ir*  * 
,J  * 

F4 CHARLIE;
AZIMUTH;
USER COMMANDS;
155 MILLIMETER;
LAUNCH SITE UPDATE;
F105;
F4 ECHO;
ALPHA;
ALL;

?? 

A4 FOXTROT;
F111;
ALPHA ONE;
READY;
COORDINATE EAST;
APPENDIX C
QUESTIONNAIRE 

ON THE FOLLOWING PAGES YOU WILL FIND
SEVERAL QUESTIONS/STATEMENTS DESIGNED TO
GET YOUR REACTIONS TO USING VOICE RECOGNITION
EQUIPMENT. ALSO, THERE ARE QUESTIONS REGARDING
YOUR EXPERIENCE WITH VARIOUS INPUT DEVICES.


PLEASE RESPOND TRUTHFULLY, AND CHECK YOUR
QUESTIONNAIRE  AFTER  COMPLETION  TO  MAKE  SURE 
YOU'VE  ANSWERED  ALL  THE  ITEMS. 


THANK  YOU  FOR  YOUR  COOPERATION  AND  PARTICIPATION 
IN  THIS  EXPERIMENT. 
HOW MUCH EXPERIENCE HAVE YOU HAD IN USING MASKS (NOT INCLUDING
THIS EXPERIMENT)?

none  some  a  lot 


HOW MUCH EXPERIENCE HAVE YOU HAD IN SPEAKING INTO MICROPHONES
(NOT INCLUDING THIS EXPERIMENT)?

none  some  a  lot 


HOW  USEFUL  DO  YOU  THINK  VOICE  RECOGNITION  EQUIPMENT  REALLY  IS? 


not at all     somewhat     very
useful         useful       useful

HOW  MUCH  DO  YOU  LIKE  VOICE  RECOGNITION  EQUIPMENT? 

don't  like  it      like  it        like  it 
at  all         somewhat      very  much 
PLEASE INDICATE THE EXTENT TO WHICH YOU AGREE OR DISAGREE WITH
THE  FOLLOWING  STATEMENTS: 

"I  WOULD  DO  BETTER  WITH  VOICE  EQUIPMENT  IF  I  DIDN'T  SEE  OR  HEAR 
WHEN  I'VE  MADE  AN  ERROR." 

disagree     neither  agree     agree 
strongly     nor  disagree    strongly 


"MAKING  ERRORS  WHEN  USING  VOICE  EQUIPMENT  IS  FRUSTRATING." 

disagree     neither  agree     agree 
strongly     nor  disagree    strongly 


"I FEEL PRESSURED WHEN USING VOICE EQUIPMENT."

disagree     neither  agree     agree 
strongly     nor  disagree    strongly 


"VOICE  EQUIPMENT  IS  TOO  HARD  TO  USE." 


disagree     neither  agree     agree
strongly     nor  disagree    strongly
"VOICE EQUIPMENT IS IMPRACTICAL."

disagree     neither  agree     agree 
strongly     nor  disagree     strongly 